One-Level Cache Memory Design for Scalable SMT Architectures

Authors

Muhamed F. Mudawar

John R. Wani

Abstract

The cache hierarchy in existing SMT and superscalar processors is optimized for latency, not for bandwidth. The size of the L1 data cache has not scaled over the past decade; instead, larger unified L2 and L3 caches were introduced. This hierarchy carries a high overhead because of the principle of containment: every cache block in an upper-level cache is also held in the lower-level cache. It also requires a complex design to maintain cache coherence across all levels. Furthermore, it is not suitable for future large-scale SMT processors, which will demand high-bandwidth instruction and data caches with a large number of ports. This paper proposes eliminating the cache hierarchy and replacing it with one-level caches for instructions and data. Multiple instruction caches can be used in parallel to scale the instruction fetch bandwidth and capacity. A one-level data cache can be split into a number of block-interleaved cache banks that serve multiple memory requests in parallel. An interconnect is required to connect the data cache ports to the cache banks, and it increases the data cache access time. This paper shows that large-scale SMTs can tolerate longer data cache hit times: increasing the data cache access time from 3 cycles to 5 cycles reduces the IPC by only 2.8%, and increasing it from 3 cycles to 7 cycles reduces the IPC by 8.9%.
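
The block-interleaved banking described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the block size, the bank count, the function names `bank_of` and `schedule`, and the rule that each bank serves one request per cycle are all assumptions chosen for illustration. Consecutive cache blocks map to consecutive banks, so requests to different blocks usually land in different banks and can proceed in parallel, while requests that collide on one bank are serialized.

```python
from collections import defaultdict

# Hypothetical parameters; the abstract does not state block size or bank count.
BLOCK_SIZE = 64  # bytes per cache block (assumed)
NUM_BANKS = 8    # number of data cache banks (assumed)


def bank_of(address, block_size=BLOCK_SIZE, num_banks=NUM_BANKS):
    """Return the index of the bank holding the block that contains `address`."""
    block_number = address // block_size
    # Block interleaving: consecutive blocks go to consecutive banks.
    return block_number % num_banks


def schedule(addresses, block_size=BLOCK_SIZE, num_banks=NUM_BANKS):
    """Group requests by bank and estimate the cycles needed to issue them.

    Assumes each bank accepts one request per cycle, so the estimate is
    the length of the longest per-bank queue.
    """
    per_bank = defaultdict(list)
    for addr in addresses:
        per_bank[bank_of(addr, block_size, num_banks)].append(addr)
    cycles = max((len(queue) for queue in per_bank.values()), default=0)
    return dict(per_bank), cycles


if __name__ == "__main__":
    # Four requests to four different blocks: no bank conflict.
    print(schedule([0, 64, 128, 192]))
    # Three requests whose blocks all map to bank 0: fully serialized.
    print(schedule([0, 512, 1024]))
```

Under these assumptions, the first call spreads its requests over four banks and needs one issue cycle, whereas the second call queues all three requests on a single bank. The sketch models only the bank selection, not the interconnect latency that the abstract evaluates.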

Similar resources

Design Challenges of Scalable Operating Systems for Many-Core Architectures

Computers will move from the multi-core reality of today to manycore. Instead of only a few cores on a chip, we will have thousands of cores available for use. This new architecture will force engineers to rethink OS design. It is the only way for operating systems to remain scalable even as the number of cores increases. Presented here are three design challenges of operating systems for many-...

Understanding the Impact of Inter-Thread Cache Interference on ILP in Modern SMT Processors

Simultaneous Multithreading (SMT) has emerged as an effective method of increasing utilization of resources in modern super-scalar processors. SMT processors increase instruction-level parallelism (ILP) and resource utilization by simultaneously executing instructions from multiple independent threads. Although simultaneously sharing resources benefits system throughput, coscheduled threads oft...

Scalable directoryless shared memory coherence using execution migration

We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family of architectures. Migration-based architectures move threads among cores to guarantee sequential semantics in large multicores. Using an execution migration (EM) architecture, we achieve performance comparable to directory-based architectures without using directories: avoiding automatic data repl...

Predictable Fine-Grained Cache Behavior for Enhanced Simultaneous Multithreading (SMT) Scheduling

By converting thread-level parallelism to instruction level parallelism, Simultaneous Multithreaded (SMT) processors are emerging as effective ways to utilize the resources of modern superscalar architectures. However, the full potential of SMT has not yet been reached as most modern operating systems use existing single-thread or multiprocessor algorithms to schedule threads, neglecting conten...

Speculative Precomputation on Chip Multiprocessors

Previous work on speculative precomputation (SP) on simultaneous multithreaded (SMT) architectures has shown significant benefits. The SP techniques improve single-threaded program performance by utilizing otherwise idle thread contexts to run "helper threads", which prefetch critical data into shared caches and reduce the time the "main thread" stalls waiting for long latency outstanding loads....

Publication date: 2004
